AITopics | conditional mi


Decomposed Mutual Information Estimation for Contrastive Representation Learning

Sordoni, Alessandro, Dziri, Nouha, Schulz, Hannes, Gordon, Geoff, Bachman, Phil, Tachet, Remi

arXiv.org Artificial Intelligence, Jun-24-2021

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
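The decomposition described in this abstract rests on the chain rule for mutual information: if a view y is split into a subview y1 and the remainder y2, then I(x; y1, y2) = I(x; y1) + I(x; y2 | y1). The following is a minimal numerical check of that identity on a small discrete distribution, not the authors' estimator. The toy joint table and the function names are assumptions made for illustration; DEMI itself works with contrastive bounds on learned representations.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats for a 2-D joint probability table p_xy[x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0  # skip zero-probability cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

def conditional_mutual_information(p_xyz):
    """I(X;Y|Z) in nats for a 3-D joint table p_xyz[x, y, z]."""
    total = 0.0
    for z in range(p_xyz.shape[2]):
        p_z = p_xyz[:, :, z].sum()
        if p_z > 0:
            # Weight the MI of the conditional table p(x, y | z) by p(z).
            total += p_z * mutual_information(p_xyz[:, :, z] / p_z)
    return total

# Hypothetical toy joint p[x, y1, y2]; y1 plays the role of the subview.
rng = np.random.default_rng(0)
p = rng.random((3, 2, 2))
p /= p.sum()

full_mi = mutual_information(p.reshape(3, 4))   # I(x; y1, y2)
mi_subview = mutual_information(p.sum(axis=2))  # I(x; y1)
# Reorder axes to [x, y2, y1] so that the last axis is the conditioning variable.
cmi = conditional_mutual_information(np.transpose(p, (0, 2, 1)))  # I(x; y2 | y1)

print(full_mi, mi_subview + cmi)  # the two values agree up to rounding
```

Each term on the right-hand side is no larger than the total, which is the property the abstract relies on when it says each term measures a modest chunk of the total MI.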

information, proc, representation, (14 more...)

arXiv.org Artificial Intelligence

2106.13401

Country:

North America > United States (0.14)
North America > Canada > Alberta (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies

Zhang, Tianyi, Hashimoto, Tatsunori

arXiv.org Artificial Intelligence, Apr-12-2021

We study how masking and predicting tokens in an unsupervised fashion can give rise to linguistic structures and downstream performance gains. Recent theories have suggested that pretrained language models acquire useful inductive biases through masks that implicitly act as cloze reductions for downstream tasks. While appealing, we show that the success of the random masking strategy used in practice cannot be explained by such cloze-like masks alone. We construct cloze-like masks using task-specific lexicons for three different classification datasets and show that the majority of pretrained performance gains come from generic masks that are not associated with the lexicon. To explain the empirical success of these generic masks, we demonstrate a correspondence between the Masked Language Model (MLM) objective and existing methods for learning statistical dependencies in graphical models. Using this, we derive a method for extracting these learned statistical dependencies in MLMs and show that these dependencies encode useful inductive biases in the form of syntactic structures. In an unsupervised parsing evaluation, simply forming a minimum spanning tree on the implied statistical dependence structure outperforms a classic method for unsupervised parsing (58.74 vs. 55.91 UUAS).
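The parsing step mentioned at the end of this abstract can be sketched as a spanning tree over pairwise dependence scores between tokens, scored with UUAS (the fraction of gold undirected edges that the predicted tree recovers). This is an illustrative sketch only. The score matrix, the gold edges, and the convention that a larger score means stronger dependence (so the tree maximizes total score) are assumptions; how the paper derives the scores from an MLM is not reproduced here.

```python
import numpy as np

def spanning_tree(scores):
    """Prim's algorithm: edges (i, j) of a maximum-weight spanning tree
    over a symmetric matrix of pairwise dependence scores."""
    n = scores.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        # Pick the strongest edge that connects the tree to a new node.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or scores[i, j] > scores[best[0], best[1]]:
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

def uuas(predicted, gold):
    """Undirected unlabeled attachment score: share of gold edges recovered."""
    norm = lambda es: {tuple(sorted(e)) for e in es}
    return len(norm(predicted) & norm(gold)) / len(norm(gold))

# Hypothetical sentence and made-up dependence scores (not from the paper).
tokens = ["the", "dog", "chased", "cats"]
scores = np.array([
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
])
tree = spanning_tree(scores)
gold = [(0, 1), (1, 2), (2, 3)]  # hypothetical gold undirected edges
print([(tokens[i], tokens[j]) for i, j in tree], uuas(tree, gold))
```

If the dependence structure is expressed as distances rather than scores, the same routine applies after negating the matrix, which turns the maximum spanning tree into a minimum one.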

conditional mi, dependency, mlm, (13 more...)

arXiv.org Artificial Intelligence

2104.05694

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)


An Information-Theoretic Approach to Explainable Machine Learning

Jung, Alexander

arXiv.org Machine Learning, Mar-1-2020

A key obstacle to the successful deployment of machine learning (ML) methods to important application domains is the (lack of) explainability of predictions. Explainable ML is challenging since explanations must be tailored (personalized) to individual users with varying backgrounds. On one extreme, users can have received graduate level education in machine learning while on the other extreme, users might have no formal education in linear algebra. Linear regression with few features might be perfectly interpretable for the first group but must be considered a black-box for the latter. Using a simple probabilistic model for the predictions and user knowledge, we formalize explainable ML using information theory. Providing an explanation is then considered as the task of reducing the "surprise" incurred by a prediction. Moreover, the effect of an explanation is measured by the conditional mutual information between the explanation and prediction, given the user background.
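The measure named in the last sentence of this abstract, the conditional mutual information I(e; ŷ | u) between an explanation e and a prediction ŷ given the user background u, can be illustrated on a small discrete example. This is a sketch under strong assumptions: the binary variables, the two user backgrounds, and the probability table are invented for illustration and are not the paper's probabilistic model.

```python
import numpy as np

def conditional_mi(p):
    """I(E; Y | U) in nats for a joint probability table p[e, y, u]."""
    p_u = p.sum(axis=(0, 1), keepdims=True)  # p(u)
    p_eu = p.sum(axis=1, keepdims=True)      # p(e, u)
    p_yu = p.sum(axis=0, keepdims=True)      # p(y, u)
    mask = p > 0                             # skip zero-probability cells
    ratio = (p * p_u)[mask] / (p_eu * p_yu)[mask]
    return float(np.sum(p[mask] * np.log(ratio)))

# Hypothetical joint over (explanation e, prediction y, user background u).
p = np.zeros((2, 2, 2))
# Background u = 0: the explanation is independent of the prediction.
p[:, :, 0] = 0.125
# Background u = 1: the explanation always coincides with the prediction.
p[0, 0, 1] = 0.25
p[1, 1, 1] = 0.25

cmi = conditional_mi(p)
print(cmi, 0.5 * np.log(2))
```

In this toy table the explanation contributes nothing for the first background and fully determines the binary prediction for the second, so the conditional MI equals half of log 2, the average over the two equally likely backgrounds.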

explainable ml, explanation, prediction, (16 more...)

arXiv.org Machine Learning

2003.00484

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Law (0.94)
Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)


Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Yang, Howard Hua, Moody, John

Neural Information Processing Systems, Dec-31-2000

Visualization of input data and feature selection are intimately related. A good feature selection algorithm can identify meaningful coordinate projections for low dimensional data visualization. Conversely, a good visualization technique can suggest meaningful features to include in a model. Input variable selection is the most important step in the model selection process. Given a target variable, a set of input variables can be selected as explanatory variables by some prior knowledge.

conditional mi, information, mutual information, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Colorado (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Yang, Howard Hua, Moody, John

Neural Information Processing Systems, Dec-31-2000

Visualization of input data and feature selection are intimately related. A good feature selection algorithm can identify meaningful coordinate projections for low dimensional data visualization. Conversely, a good visualization technique can suggest meaningful features to include in a model. Input variable selection is the most important step in the model selection process. Given a target variable, a set of input variables can be selected as explanatory variables by some prior knowledge.

artificial intelligence, machine learning, mutual information, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
